Bayesian Learning at the Syntax-Semantics Interface

Author

  • Sourabh Niyogi
Abstract

Given a small number of examples of scene-utterance pairs of a novel verb, language learners can learn its syntactic and semantic features. Syntactic and semantic bootstrapping hypotheses both rely on cross-situational observation to resolve the ambiguity present in a single observation. In this paper, we cast the distributional evidence from scenes and syntax in a unified Bayesian probabilistic framework. Unlike previous approaches to modeling lexical acquisition, our framework uniquely: (1) models learning from only a small number of scene-utterance pairs; (2) utilizes and integrates both syntactic and semantic evidence, thus reconciling the apparent tension between syntactic and semantic bootstrapping approaches; (3) robustly handles noise; and (4) makes prior and acquired knowledge distinctions explicit, through specification of the hypothesis space, prior, and likelihood probability distributions.

Learning Word Syntax and Semantics

Given a small number of examples of scene-utterance pairs of a novel word, a child can determine both the range of syntactic constructions the novel word can appear in and inductively generalize to other scene instances likely to be covered by the concept represented (Pinker 1989). The inherent semantic, syntactic, and referential uncertainty in a single scene-utterance pair is well established (cf. Siskind 1996). In contrast, with multiple scene-utterance pairs, language learners can reduce the uncertainty about which semantic and syntactic features are associated with a novel word. Verbs exemplify the core problems of scene-utterance referential uncertainty. Verbs selectively participate in different alternation patterns, which are cues to their inherent semantic and syntactic features (Levin 1993). How are these features of words acquired, given only positive evidence of scene-utterance pairs?
The syntactic bootstrapping hypothesis (Gleitman 1990) is that learners exploit the distribution of "syntactic frames" to constrain the possible semantic features of verbs. If a learner hears /glip/ in frames of the form /S glipped G with F/ and rarely hears /S glipped F into G/, the learner can with high confidence infer /glip/ to be in the same verb class as /fill/ and to have the same sort of argument structure. A different distribution informs the learner of a different verb class. Considerable evidence has mounted in support of this hypothesis (cf. Naigles 1990, Fisher et al. 1994). In contrast, the semantic bootstrapping hypothesis (Pinker 1989) is that learners use what is common across scenes to constrain the possible word argument structures. If a learner sees a liquid undergoing a location change when /S glipped F/ is uttered, then /glip/ is likely to be in the same verb class as /pour/ and to have the same sort of meaning. Both hypotheses require the distribution of cross-situational observations. Prior accounts of modeling word learning have either ignored the essential role of syntax in word learning (Siskind 1996, Tenenbaum and Xu 2000) or required thousands of training observations (Regier et al. 2001) to enable learning. In this paper we present a Bayesian model of learning the syntax and semantics of verbs that overcomes these barriers, by demonstrating how word-concept mappings can be achieved from very little evidence, where the evidence is information from both scenes and syntax.

Bayesian Learning of Features

We illustrate our approach with a Bayesian analysis of a single feature. On some accounts, verbs possess a cause feature which may be valued 1, *, or 0 (Harley and Noyer 2000); depending on the value of the cause feature, the verb may appear in frame F1, F0, or both:

1 (externally caused), e.g. touch, load. F1: He touched the glass. F0: *The glass touched.
* (externally causable), e.g. break, fill. F1: He broke the glass. F0: The glass broke.
0 (internally caused), e.g. laugh, glow. F1: *He laughed the children. F0: The children laughed.

Assuming this analysis, learners who hear utterances containing a novel verb, not knowing the value of its cause feature, must choose between 3 distinct hypotheses H1, H*, and H0. Clearly, one utterance cannot uniquely determine the value of the feature: if learners hear F1 (/S Ved O/), the evidence supports H1 or H*; similarly, if learners hear F0 (/O Ved/), the feature may be 0 or *. Two utterances cannot determine the feature uniquely either. Learners might receive both F1 and F0, supporting H* uniquely. But they may also accidentally receive 2 utterances of the same form (F0, F0 or F1, F1), thus not resolving the ambiguity. If learners received 6 utterances of the same form F0 or F1, however, then there is overwhelming support for H0 or H1 respectively, and H* seems far less likely.

A Bayesian analysis renders the above reasoning precise and quantitative. Knowledge is encoded in three core components: (1) the structure of the hypothesis space H; (2) the prior probability p(Hi) of each hypothesis Hi in H, before learners are provided any evidence; and (3) the likelihood of observing evidence X given a particular Hi, p(X|Hi). Given evidence X = [x1, ..., xN] of N independent observations, by Bayes' rule the posterior probability of a particular hypothesis Hi is:

  p(Hi|X) = [ prod_{j=1}^{N} p(xj|Hi) ] p(Hi) / p(x1, ..., xN)   (1)

signaling the support for a particular hypothesis Hi given evidence X. In this case, xj is the observation of a syntactic frame (F0 or F1), and X is a distribution of syntactic frames.
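The update in equation (1) is simple to compute in practice. The following is a minimal sketch, not code from the paper; the function name and data layout are our own choices:

```python
def posterior(priors, likelihoods, observations):
    """Compute p(Hi|X) for each hypothesis Hi by Bayes' rule.

    priors: dict mapping hypothesis -> p(Hi)
    likelihoods: dict mapping hypothesis -> {observation: p(x|Hi)}
    observations: list of observations x1..xN, assumed independent
    """
    # Unnormalized score: p(Hi) * product over j of p(xj|Hi)
    scores = {}
    for h, p_h in priors.items():
        score = p_h
        for x in observations:
            score *= likelihoods[h][x]
        scores[h] = score
    # Normalizing constant p(x1, ..., xN) is the sum over hypotheses
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}
```

Because the observations are treated as independent, each new frame simply multiplies another likelihood term into the running score before normalization.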
One simple prior probability model p(Hi) has each of the 3 hypotheses equally likely, encoding that a verb is equally likely to be of the /touch/, /laugh/, or /break/ class:

  p(H1) = p(H*) = p(H0) = 1/3   (2)

and a likelihood model p(xj|Hi) encodes how likely we are to observe frames F0 or F1 for the 3 different feature values of cause:

  p(xj = F1|H1) = .95   p(xj = F0|H1) = .05
  p(xj = F1|H*) = .50   p(xj = F0|H*) = .50   (3)
  p(xj = F1|H0) = .05   p(xj = F0|H0) = .95

The above likelihood model says that when a verb has cause=1, we expect frames of the form /S Ved O/ 95% of the time; when a verb has cause=0, we expect /O Ved/ 95% of the time; and when a verb has cause=*, we expect both syntactic frames equally. Both the prior probability model and the likelihood model are stipulated, encoding a learner's prior knowledge of grammar. Given these probability models, the support for each hypothesis can be computed explicitly. Suppose a learner receives F0. Then the support for each of the 3 hypotheses is:

  p(H1|F0) = (.05)(.33) / [(.05 + .50 + .95)(.33)] = .033
  p(H*|F0) = (.50)(.33) / [(.05 + .50 + .95)(.33)] = .333   (4)
  p(H0|F0) = (.95)(.33) / [(.05 + .50 + .95)(.33)] = .633

Any number of situations may be analyzed in this way:

  Evidence X                      p(H1|X)  p(H*|X)  p(H0|X)
  1  F0                           .033     .333     .633
  2  F0, F0                       .002     .216     .781
  3  F0, F0, F0, F0, F0, F0       2e-8     .021     .979
  4  F0, F1                       .137     .724     .137
  5  F0, F1, F0, F1, F0, F1       .007     .986     .007
  6  F0, F1, F1, F1, F1, F1       .712     .288     5e-6

When only F0 is given as evidence (situation 1), while both H0 and H* are consistent with the observation, H0 is nearly twice as likely. However, with 2 observations of F0 (situation 2) or 6 observations (situation 3), it is increasingly likely that H0 is the correct hypothesis. With both F0 and F1 as evidence (situation 4), in contrast, H* is far more likely; with more evidence (situation 5), it becomes more so.
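Rows of this table can be reproduced directly from the stipulated models. The following is a sketch under the uniform prior of equation (2) and the likelihood model of equation (3); variable and function names are our own:

```python
from math import prod

# Likelihood model p(xj|Hi) of equation (3)
LIK = {"H1": {"F1": 0.95, "F0": 0.05},
       "H*": {"F1": 0.50, "F0": 0.50},
       "H0": {"F1": 0.05, "F0": 0.95}}

def support(frames):
    """Posterior p(Hi|X) for a list of observed syntactic frames X."""
    # With a uniform prior, p(Hi) cancels in the normalization,
    # so only the product of likelihoods matters
    scores = {h: prod(LIK[h][f] for f in frames) for h in LIK}
    z = sum(scores.values())
    return {h: round(s / z, 3) for h, s in scores.items()}

print(support(["F0"]))                      # situation 1
print(support(["F0"] * 6))                  # situation 3
print(support(["F0", "F1"] * 3))            # situation 5
```

Minor discrepancies in the last digit relative to the table can arise from rounding versus truncation of the displayed probabilities.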
Finally, if the first frame is a "noise" frame and is followed by 5 representative frames of F1 (situation 6), then H1 is most likely instead. Given this framework, just one or two observations are sufficient to make an informed judgment. Note that each additional observation increases certainty, and noise is handled gracefully.

Modeling Semantic Bootstrapping

In this section, we extend the single-feature analysis to multiple features, where each feature represents information from scenes (from any modality, whether perceptual, mental, etc.). Setting aside verbal aspect, we may model possible verb meanings as a set of M features, where each feature represents a predicate on one or more of the arguments of the verb. For example, a set of single-argument predicates might include:

  moving(x), rotating(x), movingdown(x), supported(x), liquid(x), container(x)

specifying the perceived situation of the argument of the verb (e.g. whether it is moving, or moving in a particular manner, etc.), while a second set of two-argument predicates might specify the relationships between arguments, given that this is an externally caused (cause=1) event:

  contact(x, y), support(x, y), attach(x, y)

Using these predicates, an idealized (partial) lexicon might contain the following word-concept mappings:

           cause   One arg x   Two arg x, y
  /lower/  1       1*11**      11*
  /raise/  1       1*01**      11*
  /rise/   0       1*0***
  /fall/   0       1*1***

specifying, in linear order, the value of each of the one- and two-argument predicates above, e.g. that /lower/ has cause=1, moving(x)=1, rotating(x)=*, movingdown(x)=1, etc.; its concept thus covers externally-caused motion events where an agent moves a theme downwards through supported contact. The verb /raise/ is nearly identical except that it has movingdown(x)=0, while /fall/ and /rise/ involve internally-caused motion (cause=0) and do not specify any two-argument predicates.
The values of * for the 4 predicates rotating(x), liquid(x), container(x), and attach(x,y) signal that these features are irrelevant to the verb's concept. Perception of a scene amounts to evaluating these predicates; scenes may or may not fall under the verb concept, conditioned on the values of these predicates. The presence of q "irrelevant" features valued as * implies 2^q possible scenes consistent with the concept. Given a hypothesis space of possible verb concepts formed by M of these sorts of predicates, the task of learning a verb's meaning given N observations X = [x1 ... xN] of scenes is to determine which of the 3^M possible concepts is the most likely. Just as before, a Bayesian model does so by computing the posterior probability distribution p(Hi|X) over concepts, given a prior distribution on hypotheses p(Hi) and a likelihood distribution of generating a particular example xj given Hi:

  p(xj|Hi) = 1/2^q if xj is in Hi, 0 otherwise;   p(Hi) = 1/3^M
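The multi-feature case can be sketched in a few lines. This is our own toy encoding, not the paper's implementation: a concept is a string over {0, 1, *}, a scene is a string over {0, 1}, a scene falls under a concept when it matches on every non-* position, and scenes are generated by the size-principle likelihood 1/2^q above:

```python
from math import prod
from itertools import product

def covers(concept, scene):
    """A scene falls under a concept if it matches all non-* features."""
    return all(c in ("*", s) for c, s in zip(concept, scene))

def likelihood(scene, concept):
    """p(xj|Hi) = 1/2^q for scenes inside the concept, 0 otherwise,
    where q is the number of irrelevant (*) features of the concept."""
    q = concept.count("*")
    return 2 ** -q if covers(concept, scene) else 0.0

def concept_posterior(scenes, M):
    """Posterior over all 3^M concepts; the uniform prior p(Hi) = 1/3^M
    cancels in the normalization. Concepts with zero support are dropped."""
    hyps = ["".join(h) for h in product("01*", repeat=M)]
    scores = {h: prod(likelihood(x, h) for x in scenes) for h in hyps}
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items() if s > 0}
```

For example, with M=2 and scenes "10" and "11", only the concepts "1*" and "**" remain consistent, and "1*" receives posterior 0.8 versus 0.2 for "**": the tighter concept is preferred because it assigns each of its scenes higher probability, which is how cross-situational evidence drives generalization.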


Publication date: 2002